Selective Sequential Model Selection

Authors

  • William Fithian
  • Jonathan Taylor
  • Robert Tibshirani
  • Ryan J. Tibshirani
Abstract

Many model selection algorithms produce a path of fits specifying a sequence of increasingly complex models. Given such a sequence and the data used to produce them, we consider the problem of choosing the least complex model that is not falsified by the data. Extending the selected-model tests of Fithian et al. (2014), we construct p-values for each step in the path which account for the adaptive selection of the model path using the data. In the case of linear regression, we propose two specific tests, the max-t test for forward stepwise regression (generalizing a proposal of Buja and Brown (2014)), and the next-entry test for the lasso. These tests improve on the power of the saturated-model test of Tibshirani et al. (2014), sometimes dramatically. In addition, our framework extends beyond linear regression to a much more general class of parametric and nonparametric model selection problems. To select a model, we can feed our single-step p-values as inputs into sequential stopping rules such as those proposed by G’Sell et al. (2013) and Li and Barber (2015), achieving control of the familywise error rate or false discovery rate (FDR) as desired. The FDR-controlling rules require the null p-values to be independent of each other and of the non-null p-values, a condition not satisfied by the saturated-model p-values of Tibshirani et al. (2014). We derive intuitive and general sufficient conditions for independence, and show that our proposed constructions yield independent p-values.
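
The final selection step described in the abstract, feeding single-step p-values into a sequential stopping rule, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `forward_stop` and `basic_stop` and the example p-values are hypothetical. `forward_stop` follows the ForwardStop rule of G'Sell et al., which keeps the largest k whose running average of -log(1 - p_i) is at most alpha. `basic_stop` is a simpler rule, of the kind used for familywise error control, that stops at the first p-value exceeding alpha.

```python
import math


def forward_stop(p_values, alpha=0.1):
    """Number of path steps kept by the ForwardStop rule (0 if none).

    Keeps the largest k such that the average of -log(1 - p_i)
    over the first k p-values is at most alpha.
    """
    running_sum = 0.0
    k_hat = 0
    for k, p in enumerate(p_values, start=1):
        # Clamp to avoid log(0) when a p-value equals exactly 1.
        running_sum += -math.log(max(1.0 - p, 1e-300))
        if running_sum / k <= alpha:
            k_hat = k
    return k_hat


def basic_stop(p_values, alpha=0.05):
    """Number of path steps before the first p-value exceeding alpha."""
    for k, p in enumerate(p_values):
        if p > alpha:
            return k
    return len(p_values)


# Illustrative (made-up) single-step p-values along a model path.
example_p = [0.001, 0.002, 0.01, 0.6, 0.8]
print(forward_stop(example_p, alpha=0.1))  # 3
print(basic_stop(example_p, alpha=0.05))   # 3
```

As the abstract notes, the FDR guarantee for rules such as ForwardStop relies on the null p-values being independent of each other and of the non-null p-values, so this sketch is only meaningful when the input p-values satisfy that condition.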


Similar resources

Applying Combined Approach of Sequential Floating Forward Selection and Support Vector Machine to Predict Financial Distress of Listed Companies in Tehran Stock Exchange Market

Objective: Nowadays, financial distress prediction is one of the most important research issues in the field of risk management that has always been interesting to banks, companies, corporations, managers and investors. The main objective of this study is to develop a high performance predictive model and to compare the results with other commonly used models in financial distress prediction M...


Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR

Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been addressed with meta-heuristic algorithms such as GA, PSO, ACO, and SA. In this work two novel hybrid meta-heuristic algorithms, i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on genetic algorithms and learning automata for QSAR f...


Toward Transparent Selective Sequential Consistency in Distributed Shared Memory Systems

This paper proposes a transparent selective sequential consistency approach to Distributed Shared Memory (DSM) systems. First, three basic techniques (time selection, processor selection, and data selection) are analyzed for improving the performance of strictly sequential consistency DSM systems, and a transparent approach to achieving these selections is proposed. Then, this paper focuses o...


Asymptotic properties of the sample mean in adaptive sequential sampling with multiple selection criteria

We extend the method of adaptive two-stage sequential sampling to include designs where more than one criterion is used in deciding on the allocation of additional sampling effort. These criteria, or conditions, can be a measure of the target population, or a measure of some related population. We develop Murthy estimators for the design that are unbiased estimators for t...


Using selective sequential extraction techniques to evaluate tendency of soil fractions in Cd removal by Fe3O4 nanoparticles in continuous flow system

The use of nanotechnology has proven to be a promising approach to the remediation of all phases of the environment. The aim of this work is to investigate the effects of different parameters on using iron III oxide nanoparticles in a continuous flow configuration for the removal of Cd2+ ions from contaminated soils. Also, selective sequential extraction tests are carried out to evaluate the nanoparticle...



Publication date: 2015